Minersoft: A Keyword-based Search Engine for Software Resources in Large-scale Grid Infrastructures

Authors

  • Marios D. Dikaiakos
  • Asterios Katsifodimos
  • George Pallis
Abstract

We investigate the problem of supporting keyword-based search for software resources installed on the nodes of large-scale, federated Grid computing infrastructures. We address a number of challenges that arise from the unstructured nature of software and the unavailability of software-related metadata on Grid sites. We present Minersoft, a Grid harvester that visits Grid sites, crawls their file systems, identifies and classifies software resources, and discovers implicit associations between them. The results of Minersoft harvesting are encoded in a weighted, typed graph called the Software Graph. A number of IR algorithms are used to enrich this graph with structural and content associations, to annotate software resources with keywords, and to build inverted indexes that support keyword-based search for software. We evaluate our approach on a real testbed, using data extracted from a production-quality Grid infrastructure. Experimental results show that Minersoft achieves high search efficiency.
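The inverted-index step mentioned in the abstract can be illustrated with a minimal sketch. This is not Minersoft's implementation: the tokenizer, the term-frequency ranking, the AND semantics for multi-keyword queries, and the sample file paths and descriptions are all assumptions made here for illustration only.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(resources):
    """Map each keyword to {resource path: term frequency}."""
    index = defaultdict(dict)
    for path, description in resources.items():
        for term, count in Counter(tokenize(description)).items():
            index[term][path] = count
    return index


def search(index, query):
    """Return paths that contain every query keyword.

    Ranking by summed term frequency is an assumption of this sketch,
    not the scoring scheme described by the authors.
    """
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matches = set(postings[0]).intersection(*postings[1:])
    scores = {path: sum(p[path] for p in postings) for path in matches}
    # Highest score first; ties are broken alphabetically by path.
    return sorted(scores, key=lambda path: (-scores[path], path))


# Hypothetical harvested resources: file path -> keywords gathered for it.
resources = {
    "/opt/blast/bin/blastn": "blast nucleotide sequence alignment binary",
    "/opt/blast/doc/README": "blast documentation sequence alignment sequence search",
    "/usr/lib/libfftw3.so": "fftw fast fourier transform library",
}
index = build_inverted_index(resources)
results = search(index, "sequence alignment")
```

In this toy data set, the query "sequence alignment" matches only the two BLAST files, and the README ranks first because the keyword "sequence" occurs twice in its text.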


Similar resources

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human-level intelligence, it is inevitable that they use knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...


Weighted-HR: An Improved Hierarchical Grid Resource Discovery

Grid computing environments include heterogeneous resources shared by a large number of computers to handle data- and process-intensive applications. In these environments, the required resources must be accessible to Grid applications on demand, which makes resource discovery a critical service. In recent years, various techniques have been proposed to index and discover the Grid resource...


A Hybrid Scavenger Grid Approach to Intranet Search

According to a 2007 global survey of 178 organisational intranets, 3 out of 5 organisations are not satisfied with their intranet search services. However, as intranet data collections become large, effective full-text intranet search services are needed more than ever before. To provide an effective full-text search service based on current information retrieval algorithms, organisations have ...


Grid Virtualization Engine: Providing Virtual Resources for Grid Infrastructure

Virtual machines offer many advantages, such as easy configuration and management, and can simplify the development and deployment of Grid infrastructures. Various virtualization implementations, despite having similar functions, often provide different management and access interfaces. The heterogeneous virtualization technologies bring challenges for employing virtual machines as computing re...


Toward a search architecture for software components

The Grid and its related technologies enable large-scale sharing of resources of various types. We envision that in the near future applications will be completely built in a bottom-up fashion using software components deployed on various locations and interconnected to form a workflow graph. In this paper, we make some proposals on the design of a component search service, enabling users to lo...



Publication date: 2009